(57) Abstract: Speech enhancement is provided in dual microphone noise reduction systems by including spectral subtraction
algorithms using linear convolution, causal filtering and/or spectrum dependent exponential averaging of the spectral subtraction gain
function. According to exemplary embodiments, when a far-mouth microphone is used in conjunction with a near-mouth microphone,
it is possible to handle non-stationary background noise as long as the noise spectrum can continuously be estimated from a single
block of input samples. The far-mouth microphone, in addition to picking up the background noise, also picks up the speaker's
voice, albeit at a lower level than the near-mouth microphone. To enhance the noise estimate, a spectral subtraction stage is used to
suppress the speech in the far-mouth microphone signal. To be able to enhance the noise estimate, a rough speech estimate is formed
with another spectral subtraction stage from the near-mouth signal. Finally, a third spectral subtraction stage is used to enhance
the near-mouth signal by suppressing the background noise using the enhanced background noise estimate. A controller dynamically
determines any or all of a first, second, and third subtraction factor for each of the first, second, and third spectral subtraction stages,
respectively.






WO 01/56328 



PCT/EP01/00468






SYSTEM AND METHOD FOR DUAL MICROPHONE SIGNAL 
NOISE REDUCTION USING SPECTRAL SUBTRACTION 



BACKGROUND

The present invention relates to communications systems, and more 
particularly, to methods and apparatus for mitigating the effects of disruptive 
background noise components in communications signals. 

Today, technology and consumer demand have produced mobile telephones of
diminishing size. As mobile telephones become smaller and smaller, the
placement of the microphone during use ends up more and more distant from the
speaker's (near-end user's) mouth. This increased distance increases the need for
speech enhancement due to disruptive background noise being picked up at the
microphone and transmitted to a far-end user. In other words, since the distance
between a microphone and a near-end user is larger in the newer, smaller mobile
telephones, the microphone picks up not only the near-end user's speech, but also any
noise which happens to be present at the near-end location. For example, the near-end
microphone typically picks up sounds such as surrounding traffic, road and passenger
compartment noise, room noise, and the like. The resulting noisy near-end speech can
be annoying or even intolerable for the far-end user. It is thus desirable that the
background noise be reduced as much as possible, preferably early in the near-end
signal processing chain (e.g., before the received near-end microphone signal is
supplied to a near-end speech coder).

As a result of interfering background noise, some telephone systems include a
noise reduction processor designed to eliminate background noise at the input of a
near-end signal processing chain. Figure 1 is a high-level block diagram of such a
system 100. In Figure 1, a noise reduction processor 110 is positioned at the output of
a microphone 120 and at the input of a near-end signal processing path (not shown). In
operation, the noise reduction processor 110 receives a noisy speech signal x from the
microphone 120 and processes the noisy speech signal x to provide a cleaner,
noise-reduced speech signal s_NR which is passed through the near-end signal processing
chain and ultimately to the far-end user.

One well known method for implementing the noise reduction processor 110 of
Figure 1 is referred to in the art as spectral subtraction. See, for example, S.F. Boll,
"Suppression of Acoustic Noise in Speech using Spectral Subtraction", IEEE Trans.
Acoust. Speech and Sig. Proc., 27:113-120, 1979, which is incorporated herein by
reference in its entirety. Generally, spectral subtraction uses estimates of the noise
spectrum and the noisy speech spectrum to form a signal-to-noise ratio (SNR) based
gain function which is multiplied by the input spectrum to suppress frequencies having
a low SNR. Though spectral subtraction does provide significant noise reduction, it
suffers from several well known disadvantages. For example, the spectral subtraction
output signal typically contains artifacts known in the art as musical tones. Further,
discontinuities between processed signal blocks often lead to diminished speech quality
from the far-end user perspective.

Many enhancements to the basic spectral subtraction method have been
developed in recent years. See, for example, N. Virag, "Speech Enhancement Based
on Masking Properties of the Auditory System," IEEE ICASSP Proc., 796-799 vol. 1,
1995; D. Tsoukalas, M. Paraskevas and J. Mourjopoulos, "Speech Enhancement using
Psychoacoustic Criteria," IEEE ICASSP Proc., 359-362 vol. 2, 1993; F. Xie and D.
Van Compernolle, "Speech Enhancement by Spectral Magnitude Estimation - A
Unifying Approach," IEEE Speech Communication, 89-104 vol. 19, 1996; R. Martin,
"Spectral Subtraction Based on Minimum Statistics," EUSIPCO Proc., 1182-1185 vol.
2, 1994; and S.M. McOlash, R.J. Niederjohn and J.A. Heinen, "A Spectral
Subtraction Method for Enhancement of Speech Corrupted by Nonwhite, Nonstationary
Noise," IEEE IECON Proc., 872-877 vol. 2, 1995.

More recently, spectral subtraction has been implemented using correct 
convolution and spectrum dependent exponential gain function averaging. These 
techniques are described in co-pending U.S. Patent Application Serial No. 09/084,387, 
filed May 27, 1998 and entitled "Signal Noise Reduction by Spectral Subtraction using 
Linear Convolution and Causal Filtering" and co-pending U.S. Patent Application 
Serial No. 09/084,503, also filed May 27, 1998 and entitled "Signal Noise Reduction 
by Spectral Subtraction using Spectrum Dependent Exponential Gain Function
Averaging."

Spectral subtraction uses two spectrum estimates, one being the "disturbed"
signal and one being the "disturbing" signal, to form a signal-to-noise ratio (SNR)
based gain function. The disturbed spectrum is multiplied by the gain function to
increase the SNR for this spectrum. In single microphone spectral subtraction
applications, such as used in conjunction with hands-free telephones, speech is
enhanced from the disturbing background noise. The noise is estimated during speech
pauses or with the help of a noise model during speech. This implies that the noise
must be stationary to have similar properties during the speech or that the model be
suitable for the moving background noise. Unfortunately, this is not the case for most
background noises in every-day surroundings.

Therefore, there is a need for a noise reduction system which uses the 
techniques of spectral subtraction and which is suitable for use with most every-day 
variable background noises. 

SUMMARY

The present invention fulfills the above-described and other needs by providing
methods and apparatus for performing noise reduction by spectral subtraction in a dual
microphone system. According to exemplary embodiments, when a far-mouth
microphone is used in conjunction with a near-mouth microphone, it is possible to
handle non-stationary background noise as long as the noise spectrum can continuously
be estimated from a single block of input samples. The far-mouth microphone, in
addition to picking up the background noise, also picks up the speaker's voice, albeit at
a lower level than the near-mouth microphone. To enhance the noise estimate, a
spectral subtraction stage is used to suppress the speech in the far-mouth microphone
signal. To be able to enhance the noise estimate, a rough speech estimate is formed
with another spectral subtraction stage from the near-mouth signal. Finally, a third
spectral subtraction stage is used to enhance the near-mouth signal by suppressing the
background noise using the enhanced background noise estimate. A controller
dynamically determines any or all of a first, second, and third subtraction factor for
each of the first, second, and third spectral subtraction stages, respectively.

The above-described and other features and advantages of the present invention
are explained in detail hereinafter with reference to the illustrative examples shown in
the accompanying drawings. Those skilled in the art will appreciate that the described
embodiments are provided for purposes of illustration and understanding and that
numerous equivalent embodiments are contemplated herein.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a block diagram of a noise reduction system in which spectral 
subtraction can be implemented; 

Figure 2 depicts a conventional spectral subtraction noise reduction processor; 

Figures 3-4 depict exemplary spectral subtraction noise reduction processors 
according to exemplary embodiments of the invention; 

Figure 5 depicts the placement of near- and far-mouth microphones in an 
exemplary embodiment of the present invention; 

Figure 6 depicts an exemplary dual microphone spectral subtraction system; and 

Figure 7 depicts an exemplary spectral subtraction stage for use in an exemplary 
embodiment of the present invention. 

DETAILED DESCRIPTION 

To understand the various features and advantages of the present invention, it is
useful to first consider a conventional spectral subtraction technique. Generally,
spectral subtraction is built upon the assumption that the noise signal and the speech
signal in a communications application are random, uncorrelated and added together to
form the noisy speech signal. For example, if s(n), w(n) and x(n) are stochastic
short-time stationary processes representing speech, noise and noisy speech, respectively,
then:

$$x(n) = s(n) + w(n) \tag{1}$$

$$R_x(f) = R_s(f) + R_w(f) \tag{2}$$

where R(f) denotes the power spectral density of a random process.

The noise power spectral density $R_w(f)$ can be estimated during speech pauses
(i.e., where x(n) = w(n)). To estimate the power spectral density of the speech, an
estimate is formed as:

$$\hat{R}_s(f) = \hat{R}_x(f) - \hat{R}_w(f) \tag{3}$$

The conventional way to estimate the power spectral density is to use a
periodogram. For example, if $X_N(f_u)$ is the N length Fourier transform of x(n) and
$W_N(f_u)$ is the corresponding Fourier transform of w(n), then:

$$\hat{R}_x(f_u) = P_{x,N}(f_u) = \frac{1}{N}\,|X_N(f_u)|^2, \quad f_u = \frac{u}{N},\ u = 0, \ldots, N-1 \tag{4}$$

$$\hat{R}_w(f_u) = P_{w,N}(f_u) = \frac{1}{N}\,|W_N(f_u)|^2, \quad f_u = \frac{u}{N},\ u = 0, \ldots, N-1 \tag{5}$$

Equations (3), (4) and (5) can be combined to provide:

$$|S_N(f_u)|^2 = |X_N(f_u)|^2 - |W_N(f_u)|^2 \tag{6}$$

Alternatively, a more general form is given by:

$$|S_N(f_u)|^a = |X_N(f_u)|^a - |W_N(f_u)|^a \tag{7}$$

where the power spectral density is exchanged for a general form of spectral density.

Since the human ear is not sensitive to phase errors of the speech, the noisy
speech phase $\varphi_x(f)$ can be used as an approximation to the clean speech phase $\varphi_s(f)$:

$$\varphi_s(f_u) \approx \varphi_x(f_u) \tag{8}$$

A general expression for estimating the clean speech Fourier transform is thus
formed as:

$$S_N(f_u) = \left(|X_N(f_u)|^a - k \cdot |W_N(f_u)|^a\right)^{1/a} \cdot e^{j\varphi_x(f_u)} \tag{9}$$



where a parameter k is introduced to control the amount of noise subtraction.

In order to simplify the notation, a vector form is introduced:

$$X_N = \left(X_N(f_0),\ X_N(f_1),\ \ldots,\ X_N(f_{N-1})\right)^T \tag{10}$$



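The vectors are computed element by element. For clarity, element by element
multiplication of vectors is denoted herein by $\odot$. Thus, equation (9) can be written
employing a gain function $G_N$ and using vector notation as:

$$S_N = G_N \odot |X_N| \odot e^{j\varphi_x} = G_N \odot X_N \tag{11}$$

where the gain function is given by:

$$G_N = \left(\frac{|X_N|^a - k \cdot |W_N|^a}{|X_N|^a}\right)^{1/a} = \left(1 - k \cdot \frac{|W_N|^a}{|X_N|^a}\right)^{1/a} \tag{12}$$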



Equation (12) represents the conventional spectral subtraction algorithm and is
illustrated in Figure 2. In Figure 2, a conventional spectral subtraction noise reduction
processor 200 includes a fast Fourier transform processor 210, a magnitude squared
processor 220, a voice activity detector 230, a block-wise averaging device 240, a
block-wise gain computation processor 250, a multiplier 260 and an inverse fast
Fourier transform processor 270.

As shown, a noisy speech input signal is coupled to an input of the fast Fourier
transform processor 210, and an output of the fast Fourier transform processor 210 is
coupled to an input of the magnitude squared processor 220 and to a first input of the
multiplier 260. An output of the magnitude squared processor 220 is coupled to a first
contact of the switch 225 and to a first input of the gain computation processor 250.
An output of the voice activity detector 230 is coupled to a throw input of the switch
225, and a second contact of the switch 225 is coupled to an input of the block-wise
averaging device 240. An output of the block-wise averaging device 240 is coupled to
a second input of the gain computation processor 250, and an output of the gain
computation processor 250 is coupled to a second input of the multiplier 260. An
output of the multiplier 260 is coupled to an input of the inverse fast Fourier transform
processor 270, and an output of the inverse fast Fourier transform processor 270
provides an output for the conventional spectral subtraction system 200.

In operation, the conventional spectral subtraction system 200 processes the
incoming noisy speech signal, using the conventional spectral subtraction algorithm
described above, to provide the cleaner, reduced-noise speech signal. In practice, the
various components of Figure 2 can be implemented using any known digital signal
processing technology, including a general purpose computer, a collection of integrated
circuits and/or application specific integrated circuitry (ASIC).

Note that in the conventional spectral subtraction algorithm, there are two
parameters, a and k, which control the amount of noise subtraction and speech quality.
Setting the first parameter to a = 2 provides a power spectral subtraction, while setting
the first parameter to a = 1 provides magnitude spectral subtraction. Additionally,
setting the first parameter to a = 0.5 yields an increase in the noise reduction while
only moderately distorting the speech. This is due to the fact that the spectra are
compressed before the noise is subtracted from the noisy speech.

The second parameter k is adjusted so that the desired noise reduction is
achieved. For example, if a larger k is chosen, the speech distortion increases. In
practice, the parameter k is typically set depending upon how the first parameter a is
chosen. A decrease in a typically leads to a decrease in the k parameter as well in
order to keep the speech distortion low. In the case of power spectral subtraction, it is
common to use over-subtraction (i.e., k > 1).
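
By way of illustration only, the conventional algorithm of equations (11) and (12) can be sketched in Python with NumPy. The function names, the small constant guarding against division by zero and the clamping of negative gains to a floor value are assumptions of this sketch and are not taken from the description above.

```python
import numpy as np

def spectral_subtraction_gain(X, W_mag_a, a=1.0, k=1.0, floor=0.0):
    """Gain of equation (12): (1 - k * |W|^a / |X|^a)^(1/a), computed per bin."""
    X_mag_a = np.abs(X) ** a
    # Guard against division by zero in empty bins (an added assumption).
    ratio = W_mag_a / np.maximum(X_mag_a, 1e-12)
    # Clamp negative values to `floor`; the description does not specify this step.
    return np.maximum(1.0 - k * ratio, floor) ** (1.0 / a)

def enhance_block(x, W_mag_a, a=1.0, k=1.0):
    """Equation (11): S_N = G_N o X_N, so the noisy phase is reused unchanged."""
    X = np.fft.fft(x)
    G = spectral_subtraction_gain(X, W_mag_a, a, k)
    return np.real(np.fft.ifft(G * X))
```

Here `W_mag_a` stands for the noise estimate |W_N|^a gathered during speech pauses. Calling `enhance_block(x, W_mag_a, a=2.0, k=1.5)` would correspond to power spectral subtraction with over-subtraction, and `a=1.0` to magnitude spectral subtraction.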

The conventional spectral subtraction gain function (see equation (12)) is
derived from a full block estimate and has zero phase. As a result, the corresponding
impulse response $g_N(u)$ is non-causal and has length N (equal to the block length).
Therefore, the multiplication of the gain function $G_N(l)$ and the input signal $X_N$ (see
equation (11)) results in a periodic circular convolution with a non-causal filter. As
described above, periodic circular convolution can lead to undesirable aliasing in the
time domain, and the non-causal nature of the filter can lead to discontinuities between
blocks and thus to inferior speech quality. Advantageously, the present invention
provides methods and apparatuses for providing correct convolution with a causal gain
filter and thereby eliminates the above described problems of time domain aliasing and
inter-block discontinuity.

With respect to the time domain aliasing problem, note that convolution in the
time-domain corresponds to multiplication in the frequency-domain. In other words:

$$x(u) * y(u) \leftrightarrow X(f) \cdot Y(f), \quad u = -\infty, \ldots, \infty \tag{13}$$

When the transformation is obtained from a fast Fourier transform (FFT) of
length N, the result of the multiplication is not a correct convolution. Rather, the
result is a circular convolution with a periodicity of N:

$$x_N \circledast y_N \leftrightarrow X_N \cdot Y_N \tag{14}$$

where the symbol $\circledast$ denotes circular convolution.

In order to obtain a correct convolution when using a fast Fourier transform,
the accumulated order of the impulse responses $x_N$ and $y_N$ must be less than or equal to
N-1, i.e., one less than the block length.

Thus, the time domain aliasing problem resulting from periodic circular
convolution can be solved by using a gain function $G_N(l)$ and an input signal block $X_N$
having a total order less than or equal to N-1.
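
The order condition can be checked numerically. The following NumPy sketch, with arbitrarily chosen lengths N = 16, L = 10 and M = 5 (illustrative assumptions), multiplies N-point spectra and compares the result with a direct linear convolution: full-length inputs wrap around (time domain aliasing), whereas a frame and a filter whose lengths satisfy L + M - 1 <= N do not.

```python
import numpy as np

N = 16                                   # FFT block length
rng = np.random.default_rng(0)

def fft_product(x, g, n):
    """Multiply n-point spectra and return to the time domain (inputs are zero-padded)."""
    return np.real(np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(g, n)))

# Case 1: signal and filter both of full length N -> circular convolution.
x_full = rng.standard_normal(N)
g_full = rng.standard_normal(N)
aliased = fft_product(x_full, g_full, N)
linear_full = np.convolve(x_full, g_full)          # true result, length 2N-1

# Case 2: frame of length L and filter of length M with L + M - 1 <= N.
L, M = 10, 5
x_L = rng.standard_normal(L)
g_M = rng.standard_normal(M)
correct = fft_product(x_L, g_M, N)
linear = np.convolve(x_L, g_M)                     # length L + M - 1 = 14

# In case 1 the tail of the linear result is folded back onto the start.
wrapped = linear_full[:N].copy()
wrapped[:N - 1] += linear_full[N:]
```

In this sketch `aliased` equals `wrapped` rather than the first N samples of `linear_full`, while `correct[:L + M - 1]` equals `linear` and the remaining samples of `correct` are zero.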

According to conventional spectral subtraction, the spectrum $X_N$ of the input
signal is of full block length N. However, according to the invention, an input signal
block $x_L$ of length L (L < N) is used to construct a spectrum of order L. The length L
is called the frame length and thus $x_L$ is one frame. Since the spectrum which is
multiplied with the gain function of length N should also be of length N, the frame $x_L$
is zero padded to the full block length N, resulting in $X_{L \uparrow N}$.

In order to construct a gain function of length N, the gain function according to
the invention can be interpolated from a gain function $G_M(l)$ of length M, where
M < N, to form $G_{M \uparrow N}(l)$. To derive the low order gain function $G_{M \uparrow N}(l)$ according to
the invention, any known or yet to be developed spectrum estimation technique can be
used as an alternative to the above described simple Fourier transform periodogram.
Several known spectrum estimation techniques provide lower variance in the resulting
gain function. See, for example, J.G. Proakis and D.G. Manolakis, Digital Signal
Processing: Principles, Algorithms, and Applications, Macmillan, Second Ed., 1992.

According to the well known Bartlett method, for example, the block of length
N is divided into K sub-blocks of length M. A periodogram for each sub-block is then
computed and the results are averaged to provide an M-long periodogram for the total
block as:

$$P_{x,M}(f_u) = \frac{1}{K}\sum_{k=0}^{K-1} P_{x,M,k}(f_u) = \frac{1}{K}\sum_{k=0}^{K-1} \left|\mathcal{F}\big(x(k \cdot M + u)\big)\right|^2, \quad f_u = \frac{u}{M},\ u = 0, \ldots, M-1 \tag{15}$$

Advantageously, the variance is reduced by a factor K when the sub-blocks are
uncorrelated, compared to the full block length periodogram. The frequency resolution
is also reduced by the same factor.
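
A minimal NumPy sketch of the Bartlett estimate of equation (15) follows. The function name and the 1/M normalization of each sub-block periodogram (chosen by analogy with equation (4)) are assumptions of the sketch; a constant scale factor cancels in any gain function formed as a ratio of two such estimates.

```python
import numpy as np

def bartlett_periodogram(x, M):
    """Average the periodograms of the K = len(x) // M non-overlapping sub-blocks."""
    K = len(x) // M
    blocks = np.reshape(x[:K * M], (K, M))          # row k holds x(k*M + u)
    # Per-sub-block periodogram (1/M)|FFT|^2, then the average over k.
    per_block = np.abs(np.fft.fft(blocks, axis=1)) ** 2 / M
    return per_block.mean(axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)                       # white noise test block
P_bartlett = bartlett_periodogram(x, 128)           # K = 8 sub-blocks, 128 bins
P_full = bartlett_periodogram(x, 1024)              # K = 1, plain periodogram
```

For the white noise block used here, the spread of `P_bartlett` across its 128 bins is markedly smaller than that of `P_full` across its 1024 bins, at the cost of the coarser frequency resolution noted above.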

Alternatively, the Welch method can be used. The Welch method is similar to
the Bartlett method except that each sub-block is windowed by a Hanning window, and
the sub-blocks are allowed to overlap each other, resulting in more sub-blocks. The
variance provided by the Welch method is further reduced as compared to the Bartlett
method. The Bartlett and Welch methods are but two spectral estimation techniques,
and other known spectral estimation techniques can be used as well.

Irrespective of the precise spectral estimation technique implemented, it is
possible and desirable to decrease the variance of the noise periodogram estimate even
further by using averaging techniques. For example, under the assumption that the
noise is long-time stationary, it is possible to average the periodograms resulting from
the above described Bartlett and Welch methods. One technique employs exponential
averaging as:

$$\bar{P}_{x,M}(l) = \alpha \cdot \bar{P}_{x,M}(l-1) + (1-\alpha) \cdot P_{x,M}(l) \tag{16}$$

In equation (16), the function $P_{x,M}(l)$ is computed using the Bartlett or Welch
method, the function $\bar{P}_{x,M}(l)$ is the exponential average for the current block and the
function $\bar{P}_{x,M}(l-1)$ is the exponential average for the previous block. The parameter
$\alpha$ controls how long the exponential memory is, and typically should not exceed the
length of how long the noise can be considered stationary. An $\alpha$ closer to 1 results in a
longer exponential memory and a substantial reduction of the periodogram variance.
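
The recursion of equation (16) is a one-line update. In the sketch below the function name, the default value of alpha and the `noise_only` flag (standing in for a voice activity decision that gates the update during speech pauses) are illustrative assumptions.

```python
import numpy as np

def update_noise_estimate(P_avg_prev, P_current, alpha=0.9, noise_only=True):
    """Equation (16): P_avg(l) = alpha * P_avg(l-1) + (1 - alpha) * P(l).

    The estimate is frozen when the block is flagged as containing speech.
    """
    if not noise_only:
        return P_avg_prev
    return alpha * P_avg_prev + (1.0 - alpha) * P_current

# Feeding a constant spectrum c from a zero start gives c * (1 - alpha**n) after n blocks.
P_avg = np.zeros(4)
for _ in range(10):
    P_avg = update_noise_estimate(P_avg, np.full(4, 2.0), alpha=0.5)
```

A value of alpha close to 1 makes the estimate respond slowly but with low variance, in line with the trade-off stated above.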

The length M is referred to as the sub-block length, and the resulting low order
gain function has an impulse response of length M. Thus, the noise periodogram
estimate $\bar{P}_{x_L,M}(l)$ and the noisy speech periodogram estimate $P_{x_L,M}(l)$ employed in
the composition of the gain function are also of length M:

$$G_M(l) = \left(1 - k \cdot \frac{\bar{P}^{\,a}_{x_L,M}(l)}{P^{\,a}_{x_L,M}(l)}\right)^{1/a} \tag{17}$$

According to the invention, this is achieved by using a shorter periodogram
estimate from the input frame $x_L$ and averaging using, for example, the Bartlett
method. The Bartlett method (or other suitable estimation method) decreases the
variance of the estimated periodogram, and there is also a reduction in frequency
resolution. The reduction of the resolution from L frequency bins to M bins means that
the periodogram estimate $P_{x_L,M}(l)$ is also of length M. Additionally, the variance of
the noise periodogram estimate $\bar{P}_{x_L,M}(l)$ can be decreased further using exponential
averaging as described above.

To meet the requirement of a total order less than or equal to N-1, the frame
length L, added to the sub-block length M, is made less than N. As a result, it is
possible to form the desired output block as:

$$S_N = G_{M \uparrow N}(l) \odot X_{L \uparrow N} \tag{18}$$

Advantageously, the low order filter according to the invention also provides an
opportunity to address the problems created by the non-causal nature of the gain filter
in the conventional spectral subtraction algorithm (i.e., inter-block discontinuity and
diminished speech quality). Specifically, according to the invention, a phase can be
added to the gain function to provide a causal filter. According to exemplary
embodiments, the phase can be constructed from a magnitude function and can be
either linear phase or minimum phase as desired.

To construct a linear phase filter according to the invention, first observe that if
the block length of the FFT is of length M, then a circular shift in the time-domain is a
multiplication with a phase function in the frequency-domain:

$$g(n - l)_M \leftrightarrow G_M(f_u) \cdot e^{-j 2\pi u l / M}, \quad f_u = \frac{u}{M},\ u = 0, \ldots, M-1 \tag{19}$$

In the instant case, l equals M/2+1, since the first position in the impulse
response should have zero delay (i.e., a causal filter). Therefore:

$$g(n - (M/2+1))_M \leftrightarrow G_M(f_u) \cdot e^{-j\pi u (1 + 2/M)} \tag{20}$$

and the linear phase filter $G_M(f_u)$ is thus obtained as

$$G_M(f_u) = |G_M(f_u)| \cdot e^{-j\pi u (1 + 2/M)} \tag{21}$$

According to the invention, the gain function is also interpolated to a length N,
which is done, for example, using a smooth interpolation. The phase that is added to
the gain function is changed accordingly, resulting in:

$$G_{M \uparrow N}(f_u) = |G_{M \uparrow N}(f_u)| \cdot e^{-j\pi u (1 + 2/M) \cdot M/N} \tag{22}$$

Advantageously, construction of the linear phase filter can also be performed in
the time-domain. In such case, the gain function $G_M(f_u)$ is transformed to the time-
domain using an IFFT, where the circular shift is done. The shifted impulse response
is zero-padded to a length N, and then transformed back using an N-long FFT. This
leads to an interpolated causal linear phase filter $G_{M \uparrow N}(f_u)$ as desired.
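
The time-domain construction just described (IFFT, circular shift, zero-padding, N-long FFT) can be sketched as follows. The function name is an assumption; the shift of M/2+1 samples follows the text, and the input is assumed to be a real, symmetric zero-phase gain of even length M so that its impulse response is real.

```python
import numpy as np

def linear_phase_interpolated_gain(G_M, N):
    """Zero-phase gain of length M -> causal, linear phase gain of length N."""
    M = len(G_M)
    g = np.real(np.fft.ifft(G_M))          # zero-phase impulse response (non-causal)
    g_shifted = np.roll(g, M // 2 + 1)     # circular shift by l = M/2 + 1 samples
    return np.fft.fft(g_shifted, N)        # fft(..., N) zero-pads to N first

rng = np.random.default_rng(0)
M, N = 8, 32
G_M = np.abs(np.fft.fft(rng.standard_normal(M)))    # a symmetric, real test gain
G_N = linear_phase_interpolated_gain(G_M, N)
```

When N is a multiple of M, every (N/M)-th bin of `G_N` coincides with the original gain multiplied by the phase factor of equation (19) with l = M/2+1, and the impulse response of `G_N` is confined to its first M samples.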

A causal minimum phase filter according to the invention can be constructed
from the gain function by employing a Hilbert transform relation. See, for example,
A.V. Oppenheim and R.W. Schafer, Discrete-Time Signal Processing, Prentice-Hall,
Inter. Ed., 1989. The Hilbert transform relation implies a unique relationship between
real and imaginary parts of a complex function. Advantageously, this can also be
utilized for a relationship between magnitude and phase, when the logarithm of the
complex signal is used, as:

$$\ln\!\left(|G_M(f_u)| \cdot e^{j \arg(G_M(f_u))}\right) = \ln\left(|G_M(f_u)|\right) + j \cdot \arg\left(G_M(f_u)\right) \tag{23}$$

In the present context, the phase is zero, resulting in a real function. The
function $\ln(|G_M(f_u)|)$ is transformed to the time-domain employing an IFFT of length
M, forming $g_M(n)$. The time-domain function is rearranged as:

$$\bar{g}_M(n) = \begin{cases} 2 \cdot g_M(n), & n = 1, 2, \ldots, M/2 - 1 \\ g_M(n), & n = 0,\ M/2 \\ 0, & n = M/2 + 1, \ldots, M - 1 \end{cases} \tag{24}$$

The function $\bar{g}_M(n)$ is transformed back to the frequency-domain using an
M-long FFT, yielding $\ln(|G_M(f_u)| \cdot e^{j \arg(G_M(f_u))})$. From this, the function $G_M(f_u)$ is
formed. The causal minimum phase filter $G_M(f_u)$ is then interpolated to a length N.
The interpolation is made the same way as in the linear phase case described above.
The resulting interpolated filter $G_{M \uparrow N}(f_u)$ is causal and has approximately minimum
phase.
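
The log-magnitude construction of equations (23) and (24) can be sketched in NumPy as below. The function name and the small lower bound applied before taking the logarithm are assumptions of the sketch; the input is assumed to be a symmetric magnitude of even length M. Interpolation to length N is omitted here.

```python
import numpy as np

def minimum_phase_gain(G_mag):
    """Build a minimum phase spectrum from a magnitude-only gain of even length M."""
    M = len(G_mag)
    # IFFT of the log magnitude; the lower bound avoids log(0) (an added assumption).
    g = np.real(np.fft.ifft(np.log(np.maximum(G_mag, 1e-8))))
    # Rearrangement of equation (24): keep n = 0 and M/2, double 1..M/2-1, zero the rest.
    g_bar = np.zeros(M)
    g_bar[0] = g[0]
    g_bar[1:M // 2] = 2.0 * g[1:M // 2]
    g_bar[M // 2] = g[M // 2]
    # The FFT yields ln|G| + j*arg(G); exponentiating forms the complex gain.
    return np.exp(np.fft.fft(g_bar))

# Check against a known minimum phase response h = [1, 0.5].
h = np.zeros(64)
h[:2] = [1.0, 0.5]
G_min = minimum_phase_gain(np.abs(np.fft.fft(h)))
h_rec = np.real(np.fft.ifft(G_min))
```

In this check the magnitude of `G_min` equals the input magnitude, and `h_rec` reproduces the leading taps 1 and 0.5 of the original minimum phase response to within numerical precision.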

The above described spectral subtraction scheme according to the invention is
depicted in Figure 3. In Figure 3, a spectral subtraction noise reduction processor 300,
providing linear convolution and causal-filtering, is shown to include a Bartlett
processor 305, a magnitude squared processor 320, a voice activity detector 330, a
block-wise averaging device 340, a low order gain computation processor 350, a
gain phase processor 355, an interpolation processor 356, a multiplier 360, an inverse
fast Fourier transform processor 370 and an overlap and add processor 380.

As shown, the noisy speech input signal is coupled to an input of the Bartlett
processor 305 and to an input of the fast Fourier transform processor 310. An output
of the Bartlett processor 305 is coupled to an input of the magnitude squared processor
320, and an output of the fast Fourier transform processor 310 is coupled to a first
input of the multiplier 360. An output of the magnitude squared processor 320 is
coupled to a first contact of the switch 325 and to a first input of the low order gain
computation processor 350. A control output of the voice activity detector 330 is
coupled to a throw input of the switch 325, and a second contact of the switch 325 is
coupled to an input of the block-wise averaging device 340.

An output of the block-wise averaging device 340 is coupled to a second input
of the low order gain computation processor 350, and an output of the low order gain
computation processor 350 is coupled to an input of the gain phase processor 355. An
output of the gain phase processor 355 is coupled to an input of the interpolation
processor 356, and an output of the interpolation processor 356 is coupled to a second
input of the multiplier 360. An output of the multiplier 360 is coupled to an input of
the inverse fast Fourier transform processor 370, and an output of the inverse fast
Fourier transform processor 370 is coupled to an input of the overlap and add
processor 380. An output of the overlap and add processor 380 provides a reduced
noise, clean speech output for the exemplary noise reduction processor 300.

In operation, the spectral subtraction noise reduction processor 300 processes
the incoming noisy speech signal, using the linear convolution, causal filtering
algorithm described above, to provide the clean, reduced-noise speech signal. In
practice, the various components of Figure 3 can be implemented using any known
digital signal processing technology, including a general purpose computer, a collection
of integrated circuits and/or application specific integrated circuitry (ASIC).
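
For orientation, the signal flow of Figure 3 can be approximated in a few lines of NumPy. This is only a sketch under stated assumptions: the function names, the default values of a and k, the clamping of the gain at zero, the 1/M periodogram normalization and the choice of the linear phase option are illustrative, and the voice activity detector and noise averaging are replaced by a noise estimate `P_noise_avg` supplied by the caller.

```python
import numpy as np

def process_frame(x_L, P_noise_avg, N, M, a=1.0, k=0.7):
    """One frame through a Figure-3-style chain; returns an N-long output block."""
    K = len(x_L) // M
    blocks = x_L[:K * M].reshape(K, M)
    # Bartlett periodogram of the frame, length M (cf. equation (15)).
    P_x = (np.abs(np.fft.fft(blocks, axis=1)) ** 2 / M).mean(axis=0)
    # Low order gain (cf. equation (17)); clamping at 0 is an added assumption.
    ratio = P_noise_avg ** a / np.maximum(P_x ** a, 1e-12)
    G_M = np.maximum(1.0 - k * ratio, 0.0) ** (1.0 / a)
    # Linear phase: shift the zero-phase impulse response by M/2 + 1 samples.
    g = np.roll(np.real(np.fft.ifft(G_M)), M // 2 + 1)
    # Interpolate to N bins by zero-padding, then filter the zero-padded frame.
    G_N = np.fft.fft(g, N)
    return np.real(np.fft.ifft(G_N * np.fft.fft(x_L, N)))

def overlap_add(blocks, L):
    """Sum N-long output blocks placed at hops of L samples."""
    N = len(blocks[0])
    out = np.zeros(L * (len(blocks) - 1) + N)
    for i, block in enumerate(blocks):
        out[i * L:i * L + N] += block
    return out

# With a zero noise estimate the gain is unity and the chain reduces to a pure delay.
rng = np.random.default_rng(1)
N, L, M = 256, 160, 32                   # L + M < N, as required above
x = rng.standard_normal(3 * L)
frames = [x[i * L:(i + 1) * L] for i in range(3)]
y = overlap_add([process_frame(f, np.zeros(M), N, M) for f in frames], L)
```

In this degenerate case `y` is the input delayed by M/2+1 samples with no discontinuity at the frame boundaries, which illustrates why the combination of a short causal filter, zero-padding and overlap-add yields a correct convolution.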

Advantageously, the variance of the gain function $G_M(l)$ of the invention can be
decreased still further by way of a controlled exponential gain function averaging
scheme according to the invention. According to exemplary embodiments, the
averaging is made dependent upon the discrepancy between the current block spectrum
$P_{x,M}(l)$ and the averaged noise spectrum $\bar{P}_{x,M}(l)$. For example, when there is a small
discrepancy, long averaging of the gain function $G_M(l)$ can be provided, corresponding
to a stationary background noise situation. Conversely, when there is a large
discrepancy, short averaging or no averaging of the gain function $G_M(l)$ can be
provided, corresponding to situations with speech or highly varying background noise.

In order to handle the transient switch from a speech period to a background
noise period, the averaging of the gain function is not increased in direct proportion to
decreases in the discrepancy, as doing so introduces an audible shadow voice (since the
gain function suited for a speech spectrum would remain for a long period). Instead,
the averaging is allowed to increase slowly to provide time for the gain function to
adapt to the stationary input.

According to exemplary embodiments, the discrepancy measure between spectra
is defined as

$$\beta(l) = \frac{\sum_{u=0}^{M-1} \left|P_{x,M,u}(l) - \bar{P}_{x,M,u}(l)\right|}{\sum_{u=0}^{M-1} \bar{P}_{x,M,u}(l)} \tag{25}$$

where $\beta(l)$ is limited by

$$\beta(l) \leftarrow \begin{cases} 1, & \beta(l) > 1 \\ \beta(l), & \beta_{\min} \le \beta(l) \le 1 \\ \beta_{\min}, & \beta(l) < \beta_{\min} \end{cases} \qquad 0 \le \beta_{\min} \ll 1 \tag{26}$$

and where $\beta(l) = 1$ results in no exponential averaging of the gain function, and
$\beta(l) = \beta_{\min}$ provides the maximum degree of exponential averaging.

The parameter $\bar{\beta}(l)$ is an exponential average of the discrepancy between
spectra, described by

$$\bar{\beta}(l) = \gamma \cdot \bar{\beta}(l-1) + (1-\gamma) \cdot \beta(l) \tag{27}$$

The parameter $\gamma$ in equation (27) is used to ensure that the gain function adapts
to the new level, when a transition from a period with high discrepancy between the
spectra to a period with low discrepancy appears. As noted above, this is done to
prevent shadow voices. According to the exemplary embodiments, the adaption is
finished before the increased exponential averaging of the gain function starts due to
the decreased level of $\beta(l)$. Thus:

$$\gamma = \begin{cases} 0, & \bar{\beta}(l-1) < \beta(l) \\ \gamma_c, & \bar{\beta}(l-1) \ge \beta(l) \end{cases} \qquad 0 < \gamma_c < 1 \tag{28}$$

When the discrepancy $\beta(l)$ increases, the parameter $\bar{\beta}(l)$ follows directly, but
when the discrepancy decreases, an exponential average is employed on $\beta(l)$ to form
the averaged parameter $\bar{\beta}(l)$. The exponential averaging of the gain function is
described by:

$$\bar{G}_M(l) = \left(1 - \bar{\beta}(l)\right) \cdot \bar{G}_M(l-1) + \bar{\beta}(l) \cdot G_M(l) \tag{29}$$



The above equations can be interpreted for different input signal conditions as
follows. During noise periods, the variance is reduced. As long as the noise spectrum
has a steady mean value for each frequency, it can be averaged to decrease the
variance. Noise level changes result in a discrepancy between the averaged noise
spectrum $\bar{P}_{x,M}(l)$ and the spectrum for the current block $P_{x,M}(l)$. Thus, the controlled
exponential averaging method decreases the gain function averaging until the noise
level has stabilized at a new level. This behavior enables handling of the noise level
changes and gives a decrease in variance during stationary noise periods and prompt
response to noise changes. High energy speech often has time-varying spectral peaks.
When the spectral peaks from different blocks are averaged, their spectral estimate
contains an average of these peaks and thus looks like a broader spectrum, which
results in reduced speech quality. Thus, the exponential averaging is kept at a
minimum during high energy speech periods. Since the discrepancy between the
average noise spectrum $\bar{P}_{x,M}(l)$ and the current high energy speech spectrum $P_{x,M}(l)$ is
large, no exponential averaging of the gain function is performed. During lower
energy speech periods, the exponential averaging is used with a short memory
depending on the discrepancy between the current low-energy speech spectrum and the
averaged noise spectrum. The variance reduction is consequently lower for low-energy
speech than during background noise periods, and larger compared to high energy
speech periods.
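
The controlled averaging of equations (25) through (29) can be sketched as three small functions. The sketch assumes that the discrepancy is the summed absolute difference between the current and averaged noise spectra, normalized by the summed averaged noise spectrum; the function names and the values chosen for beta_min and gamma_c are illustrative assumptions only.

```python
import numpy as np

def discrepancy(P_cur, P_noise_avg, beta_min=0.05):
    """Equations (25)-(26): normalized spectral discrepancy, limited to [beta_min, 1]."""
    beta = np.sum(np.abs(P_cur - P_noise_avg)) / np.sum(P_noise_avg)
    return float(np.clip(beta, beta_min, 1.0))

def smooth_discrepancy(beta_avg_prev, beta, gamma_c=0.8):
    """Equations (27)-(28): follow increases at once, decay slowly on decreases."""
    gamma = 0.0 if beta_avg_prev < beta else gamma_c
    return gamma * beta_avg_prev + (1.0 - gamma) * beta

def average_gain(G_avg_prev, G_cur, beta_avg):
    """Equation (29): beta_avg = 1 disables averaging, a small beta_avg averages heavily."""
    return (1.0 - beta_avg) * G_avg_prev + beta_avg * G_cur
```

With these definitions a block containing high energy speech drives the discrepancy to 1, so the current gain is used unaveraged, whereas a return to stationary noise lowers the averaged discrepancy only gradually (by the factor gamma_c per block), which is the mechanism described above for avoiding shadow voices.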

The above described spectral subtraction scheme according to the invention is 
depicted in Figure 4. In Figure 4, a spectral subtraction noise reduction processor 400, 
providing linear convolution, causal-filtering and controlled exponential averaging, is 
shown to include the Bartlett processor 305, the magnitude squared processor 320, the 
voice activity detector 330, the block-wise averaging device 340, the low order gain 
computation processor 350, the gain phase processor 355, the interpolation processor 
356, the multiplier 360, the inverse fast Fourier transform processor 370 and the 
overlap and add processor 380 of the system 300 of Figure 3, as well as an averaging 
control processor 445, an exponential averaging processor 446 and an optional fixed 
FIR post filter 465. 

As shown, the noisy speech input signal is coupled to an input of the Bartlett 
processor 305 and to an input of the fast Fourier transform processor 310. An output 
of the Bartlett processor 305 is coupled to an input of the magnitude squared processor 
320, and an output of the fast Fourier transform processor 310 is coupled to a first 
input of the multiplier 360. An output of the magnitude squared processor 320 is
coupled to a first contact of the switch 325, to a first input of the low order gain
computation processor 350 and to a first input of the averaging control processor 445. 

A control output of the voice activity detector 330 is coupled to a throw input of
the switch 325, and a second contact of the switch 325 is coupled to an input of the
block-wise averaging device 340. An output of the block-wise averaging device 340 is
coupled to a second input of the low order gain computation processor 350 and to a
second input of the averaging controller 445. An output of the low order gain
computation processor 350 is coupled to a signal input of the exponential averaging
processor 446, and an output of the averaging controller 445 is coupled to a control
input of the exponential averaging processor 446.

An output of the exponential averaging processor 446 is coupled to an input of 
the gain phase processor 355, and an output of the gain phase processor 355 is coupled
to an input of the interpolation processor 356. An output of the interpolation processor
356 is coupled to a second input of the multiplier 360, and an output of the optional
fixed FIR post filter 465 is coupled to a third input of the multiplier 360. An output of
the multiplier 360 is coupled to an input of the inverse fast Fourier transform processor 
370, and an output of the inverse fast Fourier transform processor 370 is coupled to an 
input of the overlap and add processor 380. An output of the overlap and add 
processor 380 provides a clean speech signal for the exemplary system 400. 
In operation, the spectral subtraction noise reduction processor 400 according to
the invention processes the incoming noisy speech signal, using the linear convolution,
causal filtering and controlled exponential averaging algorithm described above, to
provide the improved, reduced-noise speech signal. As with the embodiment of
Figure 3, the various components of Figure 4 can be implemented using any known
digital signal processing technology, including a general purpose computer, a collection
of integrated circuits and/or application specific integrated circuitry (ASIC).

Note that, according to exemplary embodiments, since the sum of the frame
length L and the sub-block length M is chosen to be shorter than N - 1, the extra fixed
FIR filter 465 of length J < N - 1 - L - M can be added as shown in Figure 4. The
post filter 465 is applied by multiplying the interpolated impulse response of the filter
with the signal spectrum as shown. The interpolation to a length N is performed by
zero padding of the filter and employing an N-long FFT. This post filter 465 can be
used to filter out the telephone bandwidth or a constant tonal component. 
Alternatively, the functionality of the post filter 465 can be included directly within the 
gain function. 

The parameters of the above described algorithm are set in practice based upon
the particular application in which the algorithm is implemented. By way of example,
parameter selection is described hereinafter in the context of a GSM mobile telephone. 

First, based on the GSM specification, the frame length L is set to 160 samples, 
which provides 20 ms frames. Other choices of L can be used in other systems. 
However, it should be noted that an increment in the frame length L corresponds to an
increment in delay. The sub-block length M (e.g., the periodogram length for the
Bartlett processor) is made small to provide increased variance reduction. Since an
FFT is used to compute the periodograms, the length M can be set conveniently to a
power of two. The frequency resolution is then determined as:

$$B = \frac{F_s}{M} \qquad (30)$$

The GSM system sample rate is 8000 Hz. Thus a length M = 16, M = 32 and
M = 64 gives a frequency resolution of 500 Hz, 250 Hz and 125 Hz, respectively.
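
The figures quoted above can be checked with a few lines of Python. This is only a worked example of the arithmetic; the function names `frequency_resolution` and `frame_duration_ms` are assumptions chosen for this sketch.

```python
def frequency_resolution(sample_rate_hz, sub_block_length):
    """Frequency resolution of a length-M periodogram: sample rate / M."""
    return sample_rate_hz / sub_block_length


def frame_duration_ms(frame_length, sample_rate_hz):
    """Duration of a frame of L samples, in milliseconds."""
    return 1000.0 * frame_length / sample_rate_hz


if __name__ == "__main__":
    fs = 8000  # GSM sample rate in Hz
    print("frame of 160 samples:", frame_duration_ms(160, fs), "ms")
    for m in (16, 32, 64):
        print("M =", m, "->", frequency_resolution(fs, m), "Hz")
```

A smaller M gives more periodograms per frame to average, and therefore lower variance, at the cost of coarser frequency resolution.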

In order to use the above techniques of spectral subtraction in a system where 
the noise is variable, such as in a mobile telephone, the present invention utilizes a two 
microphone system. The two microphone system is illustrated in Figure 5, where 582
is a mobile telephone, 584 is a near-mouth microphone, and 586 is a far-mouth
microphone. When a far-mouth microphone is used in conjunction with a near-mouth
microphone, it is possible to handle non-stationary background noise as long as the
noise spectrum can continuously be estimated from a single block of input samples.

The far-mouth microphone 586, in addition to picking up the background noise,
also picks up the speaker's voice, albeit at a lower level than the near-mouth
microphone 584. To enhance the noise estimate, a spectral subtraction stage is used to
suppress the speech in the far-mouth microphone 586 signal. To be able to enhance the
noise estimate, a rough speech estimate is formed with another spectral subtraction
stage from the near-mouth signal. Finally, a third spectral subtraction stage is used to
enhance the near-mouth signal by suppressing the background noise using the enhanced
noise estimate.

A potential problem with the above technique is the need to make low variance
estimates of the filter, i.e., the gain function, since the speech and noise estimates can
only be formed from a short block of data samples. In order to reduce the variability
of the gain function, the single microphone spectral subtraction algorithm discussed
above is used. This method reduces the variability of the gain function by
using Bartlett's spectrum estimation method to reduce the variance. The frequency
resolution is also reduced by this method, but this property is used to make a causal true
linear convolution. In an exemplary embodiment of the present invention, the
variability of the gain function is further reduced by adaptive averaging, controlled by
a discrepancy measure between the noise and noisy speech spectrum estimates.

In the two microphone system of the present invention, as illustrated in Figure
6, there are two signals: the continuous signal from the near-mouth microphone 584,
where the speech is dominating, $x_s(n)$; and the continuous signal from the far-mouth
microphone 586, where the noise is more dominant, $x_n(n)$. The signal from the
near-mouth microphone 584 is provided to an input of a buffer 689 where it is broken down
into blocks $x_s(i)$. In an exemplary embodiment of the present invention, buffer 689 is
also a speech encoder. The signal from the far-mouth microphone 586 is provided to
an input of a buffer 687 where it is broken down into blocks $x_n(i)$. Both buffers 687
and 689 can also include additional signal processing such as an echo canceller in order
to further enhance the performance of the present invention. An analog to digital
(A/D) converter (not shown) converts an analog signal, derived from the microphones
584, 586, to a digital signal so that it may be processed by the spectral subtraction
stages of the present invention. The A/D converter may be present either prior to or
following the buffers 687, 689.

The first spectral subtraction stage 601 has as its input a block of the near-mouth
signal, $x_s(i)$, and an estimate of the noise from the previous frame, $Y_n(f,i-1)$.
The estimate of noise from the previous frame is produced by coupling the output of
the second spectral subtraction stage 602 to the input of a delay circuit 688. The
output of the delay circuit 688 is coupled to the first spectral subtraction stage 601.
This first spectral subtraction stage is used to make a rough estimate of the speech,
$Y_r(f,i)$. The output of the first spectral subtraction stage 601 is supplied to the second
spectral subtraction stage 602 which uses this estimate ($Y_r(f,i)$) and a block of the
far-mouth signal, $x_n(i)$, to estimate the noise spectrum for the current frame, $Y_n(f,i)$.
Finally, the output of the second spectral subtraction stage 602 is supplied to the third
spectral subtraction stage 603 which uses the current noise spectrum estimate, $Y_n(f,i)$,
and a block of the near-mouth signal, $x_s(i)$, to estimate the noise reduced speech, $Y_s(f,i)$.
The output of the third spectral subtraction stage 603 is coupled to an input of the
inverse fast Fourier transform processor 670, and an output of the inverse fast Fourier
transform processor 670 is coupled to an input of the overlap and add processor 680.
The output of the overlap and add processor 680 provides a clean speech signal as an
output from the exemplary system 600.
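
The data flow between the three stages, including the one-frame delay of the noise estimate, can be sketched in Python as follows. This is a deliberately simplified illustration and not the stage implementation of the text: the helper `subtract` performs a plain magnitude-domain subtraction floored at zero, whereas the stages described here filter with a smoothed, causal gain function. The names `subtract` and `process_frame` are assumptions made for this example.

```python
def subtract(signal_mag, unwanted_mag, k):
    """Simplified stand-in for one spectral subtraction stage.

    Subtracts k times the unwanted magnitude spectrum from the signal
    magnitude spectrum, bin by bin, and floors the result at zero.
    """
    return [max(s - k * u, 0.0) for s, u in zip(signal_mag, unwanted_mag)]


def process_frame(near_mag, far_mag, prev_noise_est, k1, k2, k3):
    """Run one frame through the three-stage structure of Figure 6."""
    # Stage 601: rough speech estimate from the near-mouth block and the
    # noise estimate of the previous frame (output of delay circuit 688).
    rough_speech = subtract(near_mag, prev_noise_est, k1)
    # Stage 602: noise estimate from the far-mouth block, with the rough
    # speech estimate suppressed.
    noise_est = subtract(far_mag, rough_speech, k2)
    # Stage 603: noise reduced speech from the near-mouth block.
    clean_speech = subtract(near_mag, noise_est, k3)
    # noise_est is returned so the caller can feed it back as
    # prev_noise_est for the next frame.
    return clean_speech, noise_est


if __name__ == "__main__":
    clean, noise = process_frame([10.0, 2.0], [3.0, 2.0], [2.0, 2.0],
                                 k1=1.0, k2=0.25, k3=1.0)
    print(clean, noise)
```

The caller is responsible for the delay: the second return value of one call becomes `prev_noise_est` of the next call.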

In an exemplary embodiment of the present invention, each spectral subtraction
stage 601-603 has a parameter which controls the size of the subtraction. This
parameter is preferably set differently depending on the input SNR of the microphones
and the method of noise reduction being employed. In addition, in a further exemplary
embodiment of the present invention, a controller 604 is used to dynamically set the
parameters for each of the spectral subtraction stages 601-603 for further accuracy in a
variable noisy environment. In addition, since the far-mouth microphone signal is used
to estimate the noise spectrum which will be subtracted from the near-mouth noisy
speech spectrum, performance of the present invention will be increased when the
background noise spectrum has the same characteristics in both microphones. That is,
for example, when using a directional near-mouth microphone, the background
characteristics are different when compared to an omni-directional far-mouth
microphone. To compensate for the differences in this case, one or both of the
microphone signals should be filtered in order to reduce the differences of the spectra.

In an exemplary embodiment of the present invention, it is desirable to keep the
delay as low as possible in telephone communications to prevent disturbing echoes and
unnatural pauses. When the signal block length is matched with the mobile telephone
system's voice encoder block length, the present invention uses the same block of
samples as the voice encoder. Thereby, no extra delay is introduced for the buffering
of the signal block. The introduced delay is therefore only the computation time of the
noise reduction of the present invention plus the group delay of the gain function
filtering in the last spectral subtraction stage. As illustrated in the third stage, a
minimum phase can be imposed on the amplitude gain function which gives a short
delay under the constraint of causal filtering.

Since the present invention uses two microphones, it is no longer necessary to 
use VAD 330, switch 325, and average block 340 as illustrated with respect to the 
single microphone use of the spectral subtraction in Figures 3 and 4. That is, the
far-mouth microphone can be used to provide a constant noise signal during both voice and
non-voice time periods. In addition, IFFT 370 and the overlap and add circuit 380
have been moved to the final output stage as illustrated as 670 and 680 in Figure 6.

The above described spectral subtraction stages used in the dual microphone 
implementation may each be implemented as depicted in Figure 7. In Figure 7, a 
spectral subtraction stage 700, providing linear convolution, causal-filtering and
controlled exponential averaging, is shown to include the Bartlett processor 705, the
frequency decimator 722, the low order gain computation processor 750, the gain
phase processor and the interpolation processor 755/756, and the multiplier 760.

As shown, the noisy speech input signal, $X_{(\cdot)}(i)$, is coupled to an input of the
Bartlett processor 705 and to an input of the fast Fourier transform processor 710. The
notation $X_{(\cdot)}(i)$ is used to represent $X_n(i)$ or $X_s(i)$, which are provided to the inputs of
spectral subtraction stages 601-603 as illustrated in Figure 6. The amplitude spectrum
of the unwanted signal, $Y_{(\cdot),N}(f,i)$, i.e., $Y_{(\cdot)}(f,i)$ with length N, is coupled to an input of the
frequency decimator 722. The notation $Y_{(\cdot)}(f,i)$ is used to represent $Y_n(f,i-1)$, $Y_r(f,i)$, or
$Y_n(f,i)$. An output of the frequency decimator 722 is the amplitude spectrum of $Y_{(\cdot)}(f,i)$
having length M, where M < N. In addition, the frequency decimator 722 reduces the
variance of the output amplitude spectrum as compared to the input amplitude
spectrum. An amplitude spectrum output of the Bartlett processor 705 and an
amplitude spectrum output of the frequency decimator 722 are coupled to inputs of the
low order gain computation processor 750. The output of the fast Fourier transform
processor 710 is coupled to a first input of the multiplier 760.

The output of the low order gain computation processor 750 is coupled to a
signal input of an optional exponential averaging processor 746. An output of the
exponential averaging processor 746 is coupled to an input of the gain phase and
interpolation processor 755/756. An output of processor 755/756 is coupled to a
second input of the multiplier 760. The filtered spectrum $Y_{(*)}(f,i)$ is thus the output of
the multiplier 760, where the notation $Y_{(*)}(f,i)$ is used to represent $Y_r(f,i)$, $Y_n(f,i)$, or
$Y_s(f,i)$. The gain function used in Figure 7 is:



(31) 



where $|X_{(\cdot),M}(f,i)|$ is the output of Bartlett processor 705, $|Y_{(\cdot),M}(f,i)|$ is the output of the
frequency decimator 722, a is a spectrum exponent, and $k_{(\cdot)}$ is the subtraction factor
controlling the amount of suppression employed for a particular spectral subtraction
stage. The gain function can be optionally adaptively averaged. This gain function
corresponds to a non-causal time-varying filter. One way to obtain a causal filter is to
impose a minimum phase. An alternate way of obtaining a causal filter is to impose a
linear phase. To obtain a gain function $G_{(\cdot)}(f,i)$ with the same number of FFT bins as
the input block $X_{(\cdot),N}(f,i)$, the gain function is interpolated, $G_{(\cdot),M\uparrow N}(f,i)$. The gain
function, $G_{(\cdot),M\uparrow N}(f,i)$, now corresponds to a causal linear filter with length M. By using
conventional FFT filtering, an output signal without periodicity effects can be obtained.
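
To make the roles of the spectrum exponent a and the subtraction factor k concrete, the following Python sketch computes a low order gain from two magnitude spectra of length M. It assumes the commonly used generalized spectral subtraction form, gain = (1 - k * |Y|^a / |X|^a)^(1/a), floored at zero; this form, the function name `subtraction_gain`, and the `floor` argument are assumptions for illustration and should not be read as the exact expression of equation (31). Phase imposition and interpolation to N bins are omitted.

```python
def subtraction_gain(x_mag, y_mag, k, a=1.0, floor=0.0):
    """Per-bin gain under an assumed generalized subtraction form.

    x_mag: magnitude spectrum of the input block (length M)
    y_mag: magnitude spectrum of the unwanted signal (length M)
    k:     subtraction factor for this stage
    a:     spectrum exponent (1 = magnitude, 2 = power)
    """
    gains = []
    for x, y in zip(x_mag, y_mag):
        if x <= 0.0:
            # No input energy in this bin: nothing to pass through.
            gains.append(floor)
            continue
        base = 1.0 - k * (y ** a) / (x ** a)
        gain = max(base, 0.0) ** (1.0 / a)
        gains.append(max(gain, floor))
    return gains


if __name__ == "__main__":
    print(subtraction_gain([4.0, 1.0], [1.0, 2.0], k=2.0))
```

A larger k removes more of the unwanted signal but drives more bins to the floor, which is why the text controls k per stage rather than fixing it.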

In operation, the spectral subtraction stage 700 according to the invention 
processes the incoming noisy speech signal, using the linear convolution, causal 
filtering and controlled exponential averaging algorithm described above, to provide 
the improved, reduced-noise speech signal. As with the embodiment of Figures 3 and 
4, the various components of Figures 6-7 can be implemented using any known digital 
signal processing technology, including a general purpose computer, a collection of 
integrated circuits and/or application specific integrated circuitry (ASIC). 

As discussed above, $k_{(\cdot)}$ is the subtraction factor controlling the amount of
suppression employed for a particular spectral subtraction stage. In one embodiment of
the present invention, each of the values of $k_{(\cdot)}$ (i.e., $k_1$, $k_2$, $k_3$, where $k_1$ is used by
spectral subtraction stage 601, $k_2$ is used by spectral subtraction stage 602, and $k_3$ is
used by spectral subtraction stage 603) is dynamically controlled by the controller 604
to compensate for the dynamic nature of the input signals. The controller 604 receives,
as an input, the gain functions $G_1$ and $G_2$ from the first and second spectral subtraction
stages 601, 602, respectively. In addition, the controller receives $x_s(i)$ and $x_n(i)$ from
buffers 689, 687, respectively. Each of the first, second, and third spectral subtraction
stages receives, as an input, a control signal from the controller indicating the present
value of the respective subtraction factor. The values of $k_{(\cdot)}$ change according to the
sound environment. That is, various factors decide the appropriate level of suppression
of the background noise and also compensate for the different energy levels of both the
background noise and the speech signal in the two microphone signals.

The block-wise energy levels in the microphone signals are denoted by $p_{1,x}(i)$
and $p_{2,x}(i)$ for the near-mouth microphone 584 and the far-mouth microphone 586
signals, respectively. The energies of the speech signal in the near-mouth microphone
584 and the far-mouth microphone 586 signals are respectively denoted by $p_{1,s}(i)$ and
$p_{2,s}(i)$, and the corresponding background noise signal energies are denoted by $p_{1,n}(i)$ and
$p_{2,n}(i)$.

The subtraction factor $k_1$ is set to the level where the first spectral subtraction
function, $SS_1$, results in a speech signal with a low noise level. The parameter $k_1$ must
also compensate for energy level differences of the background signal in the two
microphone signals. When the background energy level in the far-mouth microphone
586 signal is greater than the level in the near-mouth microphone 584, $k_1$ should
decrease, hence

$$k_1 \propto \frac{p_{1,n}(i)}{p_{2,n}(i)} \qquad (32)$$

The second spectral subtraction function, $SS_2$, is used to enhance the noise
signal in the far-mouth microphone 586 signal. The subtraction factor $k_2$ controls how
much of the speech signal should be suppressed. Since the speech signal in the
near-mouth microphone 584 signal has a higher energy level than in the secondary
microphone signal, $k_2$ must compensate for this, hence




(33) 



The resulting noise estimate should contain a highly reduced speech signal, preferably
no speech signal at all, since remains of the desired speech signal will be
disadvantageous to the speech enhancement procedure and will thus lower the quality
of the output.

The third spectral subtraction function, $SS_3$, is controlled in a similar manner as
$SS_1$.

A number of different exemplary control procedures for determining the values
of the subtraction factors are described below. Each procedure is described as 
controlling all the subtraction factors, however, one skilled in the art will recognize 
that multiple control procedures can be used to jointly derive a subtraction factor level. 
In addition, different control procedures can be used for the determination of each 
subtraction factor. 

The first exemplary control procedure makes use of the power or magnitude of 
the input microphone spectra. The parameters $p_{1,x}(i)$, $p_{2,x}(i)$, $p_{1,s}(i)$, $p_{2,s}(i)$, $p_{1,n}(i)$, and
$p_{2,n}(i)$ are defined as above or replaced by the corresponding magnitude estimates.

This procedure is built on the idea of adjusting the energy levels of the speech 
and noise by means of the subtraction factors. By using the spectral subtraction 
equation it is possible to derive suitable factors so the energy in the two microphones is 
leveled. 

The subtraction factor in the speech pre-processing spectral subtraction can be 
derived from the $SS_1$ equations



l.L\N 



,<f,i). 



(34)
(35)



giving 



PJi) - 



l -* t (0 



(36) 



In equation (36), a = 1 and the spectra have been replaced by the energy measures,
$p_{y_r}(i)$ and $p_{y_n}(i-1)$, of the output from the speech and noise pre-processors.
Solving the equation for the direct subtraction factor gives



Jfc x (0 * 



(37) 



To reduce the iterative coupling in the calculation the equation is restated with 
the mean of the gain functions 



fe i(') T^F — 77~T\ i 



(38) 



10 



where $t_1$ is a fixed multiplication factor setting the overall noise reduction level and

$$\bar{g}_1(i) = \frac{1}{M}\sum_{m=0}^{M-1} G_{1,M}(m,i). \qquad (39)$$



Equation (38) is dependent on the ratio of the noise levels in the two 
microphone signals. Besides $t_1$, equation (38) only compensates for differences in
energy between the two microphones. The subtraction factor $k_1(i)$ increases during
speech periods. This is suitable behavior since a stronger noise reduction is needed 
during these periods. 

To reduce the variability and to limit $k_1$ to a reasonable range, the averaged
subtraction factor is introduced

$$\bar{k}_1(i) = \frac{1}{\Delta_1+1}\sum_{\delta_1=0}^{\Delta_1}
\begin{cases}
\max_{k_1}(i), & k_1(i-\delta_1) > \max_{k_1}(i) \\
k_1(i-\delta_1), & \min_{k_1} \le k_1(i-\delta_1) \le \max_{k_1}(i) \\
\min_{k_1}, & k_1(i-\delta_1) < \min_{k_1}
\end{cases} \qquad (41)$$

where $\Delta_1+1$ is the number of averaged subtraction factors, $\min_{k_1}$ is the minimum
allowed $k_1$, and $\max_{k_1}(i)$ is the maximum allowed $k_1$ calculated by

$$\max_{k_1}(i) = \min\left([k_1(i), k_1(i-1), \ldots, k_1(i-\Delta_1)]\right) + r_1 \qquad (42)$$

The maximum $\max_{k_1}(i)$ is used to prevent the subtraction level during speech periods
from becoming too high, and to decrease the fluctuations of the gain function. The
maximum is set by an offset, $r_1$, to the minimum $k_1(i)$ found during the last $\Delta_1$ frames.
Parameter $\Delta_1$ should be large enough so it will cover part of the last "noise only"
period. The averaged subtraction factor is then used in the spectral subtraction
equation (35) instead of the direct subtraction factor $k_1$.
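
The limiting and averaging of the first subtraction factor can be sketched in Python as below. The sketch follows the description above: the upper limit is the smallest recent direct factor plus an offset, each recent factor is clamped to the allowed range, and the clamped values are averaged. The function name `averaged_subtraction_factor` and its argument names are assumptions for this example, as is the handling of a history shorter than the window (the mean is then taken over the values available).

```python
def averaged_subtraction_factor(k_history, window_frames, offset, min_k):
    """Clamp and average the most recent direct subtraction factors.

    k_history:     direct factors, oldest first, most recent last
    window_frames: number of past frames to include besides the current one
    offset:        added to the smallest recent factor to form the upper limit
    min_k:         lower limit for each factor
    """
    if not k_history:
        raise ValueError("k_history must not be empty")
    window = k_history[-(window_frames + 1):]
    # Upper limit: minimum over the window plus an offset, so that the
    # level reached during speech stays tied to the last noise-only level.
    max_k = min(window) + offset
    clamped = [min(max(k, min_k), max_k) for k in window]
    return sum(clamped) / len(clamped)


if __name__ == "__main__":
    history = [1.0, 1.0, 4.0, 2.0]  # the 4.0 might stem from a speech burst
    print(averaged_subtraction_factor(history, window_frames=3,
                                      offset=1.0, min_k=0.5))
```

In this example the upper limit is 2.0, so the spike of 4.0 contributes only 2.0 to the average.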

The parameter $\bar{k}_3(f,i)$ is derived in the same way as $\bar{k}_1(i)$ except that it is
calculated for each frequency bin separately followed by a smoothing in frequency.



3 pjf*W> 3 



i ' pj 

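$$\bar{k}_3(f,i) = \frac{1}{\Delta_3+1}\sum_{\delta_3=0}^{\Delta_3}
\begin{cases}
\max_{k_3}(i), & k_3(f,i-\delta_3) > \max_{k_3}(i) \\
k_3(f,i-\delta_3), & \min_{k_3} \le k_3(f,i-\delta_3) \le \max_{k_3}(i) \\
\min_{k_3}, & k_3(f,i-\delta_3) < \min_{k_3}
\end{cases} \qquad (44)$$

$$\max_{k_3}(i) = \min\left([k_3(f,i), k_3(f,i-1), \ldots, k_3(f,i-\Delta_3)]\right) + r_3, \quad f \in [0, 1, \ldots, M-1] \qquad (45)$$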



where $k_3(f,i)$ is the subtraction factor at discrete frequencies $f \in [0, 1, \ldots, M-1]$.
Further, $p_{1,x}(f,i)$ and $p_{2,x}(f,i)$ are the power or magnitude of the respective input
microphone signals at individual frequency bins. The transfer function between the
two microphone signals is frequency dependent. This frequency dependence varies
over time due to movement of, for example, the mobile phone and how it is held. A
frequency dependence can also be used for the first two subtraction factors if desired.
However, this increases computational complexity. 

Even though the subtraction factor is calculated in each frequency band, it is
smoothed over frequencies to reduce its variability giving

$$\bar{\bar{k}}_3(f,i) = \frac{1}{V}\sum_{v=-(V-1)/2}^{(V-1)/2} \bar{k}_3\left([f+v]_0^M,\, i\right) \qquad (46)$$

where V is the odd length of the rectangular smoothing window and $[f+v]_0^M$ is an
interval restriction of the frequency at 0 and M, respectively. The subtraction factor
$\bar{\bar{k}}_3(f,i)$, smoothed in both frequency and frame directions, is used in the third
spectral subtraction equation instead of the direct subtraction factor.
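
The smoothing over frequency with a rectangular window of odd length can be sketched as follows. The function name `smooth_over_frequency` is an assumption for this example, and the interval restriction is interpreted here as clamping the bin index to the valid range, so that bins at the edges are repeated rather than wrapped around; other edge treatments are possible.

```python
def smooth_over_frequency(k_bins, window_length):
    """Rectangular moving average over frequency bins.

    k_bins:        per-bin subtraction factors for one frame
    window_length: odd number of bins in the smoothing window
    """
    if window_length < 1 or window_length % 2 != 1:
        raise ValueError("window_length must be a positive odd integer")
    half = window_length // 2
    last = len(k_bins) - 1
    smoothed = []
    for f in range(len(k_bins)):
        total = 0.0
        for v in range(-half, half + 1):
            # Clamp the index so the window never leaves the spectrum.
            index = min(max(f + v, 0), last)
            total += k_bins[index]
        smoothed.append(total / window_length)
    return smoothed


if __name__ == "__main__":
    print(smooth_over_frequency([0.0, 3.0, 6.0], 3))
```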

The noise pre-processor subtraction factor is different since it decides the 
amount of speech signal that should be removed from the far-mouth microphone 586 
signal. It can be derived from the spectral subtraction equations 

Y nJf>» = G 2. M W <K <> ^X.^ *>' (47) 



1-k 



(48) 



giving



i-k 2 (i) 



(49) 



In equation (49), the spectra have been replaced by the energy measures and a = 1.
Solving the equation for the direct subtraction factor $k_2(i)$ gives



(50) 



where an overall speech reduction level, $r_2$, is also introduced. By restating equation
(50) without explicitly using the energy of the pre-processed signals, a more robust
control is obtained:
p 2 , x mi-8 2M a-i)) f



f.(0- 2 " -r^rr (51) 



Equation (51) depends on the ratio between the speech levels in the two microphone 
signals. 

To reduce the variability and to limit $k_2$ to an allowed range, an exponentially
averaged subtraction factor is introduced



max k2 (i), k z (i)>max k2 

fc 2 (i), min k2 < fe 2 (i) < max k2 



(52) 



min k2 , k\(i)<min 



k2 



where $\beta_2$ is the exponential averaging constant, $\max_{k_2}$ is the maximum allowed $k_2$ and
$\min_{k_2}$ is the minimum allowed $k_2$. The averaged subtraction factor is then used in the
spectral subtraction equation (48) instead of the direct subtraction factor $k_2$.
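
A minimal Python sketch of a limited, exponentially averaged second subtraction factor is given below. The recursion used here, previous average times the constant plus limited current value times one minus the constant, is an assumed form chosen by analogy with the exponential averaging used elsewhere in this description; the function name `update_k2` and its argument names are likewise assumptions for this example.

```python
def update_k2(prev_avg_k2, k2, averaging_constant, min_k2, max_k2):
    """Limit the direct factor to [min_k2, max_k2], then average it.

    averaging_constant close to 1 gives a long memory;
    averaging_constant equal to 0 disables the averaging.
    """
    if not 0.0 <= averaging_constant <= 1.0:
        raise ValueError("averaging_constant must lie in [0, 1]")
    limited = min(max(k2, min_k2), max_k2)
    return (averaging_constant * prev_avg_k2
            + (1.0 - averaging_constant) * limited)


if __name__ == "__main__":
    # A direct factor of 5.0 is first limited to 3.0, then averaged
    # with the previous value of 1.0.
    print(update_k2(1.0, 5.0, 0.5, min_k2=0.5, max_k2=3.0))
```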

An alternative exemplary control procedure makes use of the correlation
between the two input microphone signals. The input time signal samples are denoted
as $x_1(n)$ and $x_2(n)$ for the near-mouth microphone 584 and far-mouth microphone 586,
respectively.

The correlation between the signals is dependent on the degree of similarity
between the signals. Generally, the correlation is higher when the user's voice is
present. Point-formed background noise sources may have the same effect on the
correlation. The correlation matrix is defined as



$$R_{x_1,x_2}(l) = \sum_{n} x_1(n+l)\cdot x_2(n) \qquad (53)$$

on a signal of infinite duration. In practice, this can be approximated by using only a
time-window of the signals



I X 

R xI, X 2&-J-^Xl(i)X 2 (i) 



(54) 



where i is the frame number, $P_1$ is the variance of the primary signal for this frame and



x^n-UJ x^n-UJ ... x^n - U x +JST- 1) 



(55) 



and 



x 2 (i)=[x(n) x(n-l) ... x(n-K)]. 



(56) 



The parameter U is the set of lags of calculated correlation values and K is the
time-window duration in samples.

The estimated correlation measure $R_{x_1,x_2}$ is used in the calculation of a new
correlation energy measure






(57) 



where Q defines a set of integers. The use of the square function, as shown in
equation (57), is not essential to the invention; other even functions can alternatively be
used on the correlation samples. The $\gamma(i)$ measure is only calculated over the present
frame. To improve quality and reduce the fluctuation of the measure, an averaged
measure is used

$$\bar{\gamma}(i) = \bar{\gamma}(i-1)\cdot\alpha + \gamma(i)\cdot(1-\alpha) \qquad (58)$$

The exponential averaging constant $\alpha$ is set to correspond to an average over less than
4 frames.

Finally, the subtraction factors can be calculated from the averaged correlation
energy measures

$$k_1(i) = (1-\bar{\gamma}(i))\cdot t_1 + r_1 \qquad (59)$$

$$k_2(i) = \bar{\gamma}(i)\cdot t_2 + r_2 \qquad (60)$$

$$k_3(i) = (1-\bar{\gamma}(i))\cdot t_3 + r_3 \qquad (61)$$

where $t_1$, $t_2$ and $t_3$ are scalar multiplication factors to adjust the amount of subtraction
that is generally used. The parameters $r_1$, $r_2$ and $r_3$ are additive to the correlation
energy measure, setting a generally lower or higher level of subtraction.

The adaptive frame-per-frame calculated subtraction factors $k_1(i)$, $k_2(i)$ and $k_3(i)$
are used in the spectral subtraction equations.
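
The correlation-based control can be sketched in Python as two small steps: smoothing the correlation energy measure over frames, and mapping the smoothed measure to the three subtraction factors. The function names `smooth_measure` and `subtraction_factors` are assumptions for this example, and the sketch assumes the correlation energy measure has been normalized to lie between 0 and 1, which the mapping suggests but which is not stated explicitly here.

```python
def smooth_measure(prev_avg, current, alpha):
    """Exponential average: prev_avg * alpha + current * (1 - alpha)."""
    return prev_avg * alpha + current * (1.0 - alpha)


def subtraction_factors(avg_measure, scale, offset):
    """Map the averaged correlation energy measure to (k1, k2, k3).

    scale:  (t1, t2, t3) multiplication factors
    offset: (r1, r2, r3) additive factors
    High correlation (speech present) lowers k1 and k3 and raises k2.
    """
    t1, t2, t3 = scale
    r1, r2, r3 = offset
    k1 = (1.0 - avg_measure) * t1 + r1
    k2 = avg_measure * t2 + r2
    k3 = (1.0 - avg_measure) * t3 + r3
    return k1, k2, k3


if __name__ == "__main__":
    avg = smooth_measure(prev_avg=0.5, current=1.0, alpha=0.5)
    print(avg)
    print(subtraction_factors(avg, scale=(2.0, 4.0, 2.0),
                              offset=(0.5, 0.0, 1.0)))
```

The opposite sign of the dependence for the second factor reflects its role: it controls how much speech is removed from the far-mouth signal, so it should grow when speech is likely present.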

Another alternative exemplary control procedure uses a fixed level of the
subtraction factors. This means that each subtraction factor is set to a level that
generally works for a large number of environments. 




In other alternative embodiments of the present invention, subtraction factors 
can be derived from other data not discussed above. For example, the subtraction 
factors can be dynamically generated from information derived from the two input 
microphone signals. Alternatively, information for dynamically generating the 
subtraction factors can be obtained from other sensors, such as those associated with a
vehicle hands-free accessory, an office hands-free kit, or a portable hands-free cable.
Still other sources of information for generating the subtraction factors include, but are 
not limited to, sensors for measuring the distance to the user, and information derived 
from user or device settings. 

In summary, the present invention provides improved methods and apparatuses
for dual microphone spectral subtraction using linear convolution, causal filtering
and/or controlled exponential averaging of the gain function. One skilled in the art
will readily recognize that the present invention can enhance the quality of any audio
signal such as music, and the like, and is not limited to only voice or speech audio
signals. The exemplary methods handle non-stationary background noises, since the
present invention does not rely on measuring the noise during noise-only periods alone. In
addition, during short duration stationary background noises, the speech quality is also
improved since background noise can be estimated during both noise-only and speech
periods. Furthermore, the present invention can be used with or without directional
microphones, and each microphone can be of a different type. In addition, the
magnitude of the noise reduction can be adjusted to an appropriate level to adjust for a
particular desired speech quality. 

Those skilled in the art will appreciate that the present invention is not limited
to the specific exemplary embodiments which have been described herein for purposes
of illustration and that numerous alternative embodiments are also contemplated. For
example, though the invention has been described in the context of mobile
communications applications, those skilled in the art will appreciate that the teachings
of the invention are equally applicable in any signal processing application in which it
is desirable to remove a particular signal component. The scope of the invention is
therefore defined by the claims which are appended hereto, rather than the foregoing
description, and all equivalents which are consistent with the meaning of the claims are
intended to be embraced therein.

We Claim:

1. A noise reduction system, comprising:

a first spectral subtraction processor configured to filter a first signal to provide
a first noise reduced output signal, wherein an amount of subtraction performed by the
first spectral subtraction processor is controlled by a first subtraction factor, $k_1$;

a second spectral subtraction processor configured to filter a second signal to
provide a noise estimate output signal, wherein an amount of subtraction performed by
the second spectral subtraction processor is controlled by a second subtraction factor,
$k_2$;

a third spectral subtraction processor configured to filter said first signal as a
function of said noise estimate output signal, wherein an amount of subtraction
performed by the third spectral subtraction processor is controlled by a third
subtraction factor, $k_3$; and

a controller for dynamically determining at least one of $k_1$, $k_2$, and $k_3$ during
operation of the noise reduction system.

2. The noise reduction system of claim 1, wherein the controller estimates a 
correlation between the first signal and the second signal. 

3. The noise reduction system of claim 2, wherein the controller derives at least
one of the first, second, and third subtraction factors, $k_1$, $k_2$, and $k_3$, based on the
correlation between the first signal and the second signal.

4. The noise reduction system of claim 2, wherein the controller estimates a set of 
correlation samples of the first signal and the second signal and computes a correlation 
measurement as a sum of squares of the set of correlation samples. 

5. The noise reduction system of claim 2, wherein the controller estimates a set of 
correlation samples of the first signal and the second signal and computes a correlation 
measurement as a sum of an even function of the set of correlation samples. 
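
The correlation measurements of claims 4 and 5 can be sketched as follows. The normalization by the signal energies, the lag range, and the function names are assumptions chosen for illustration; the claims only require a set of correlation samples and a sum of squares, or of another even function, over that set.

```python
import math
from typing import Callable, List, Sequence


def correlation_samples(x: Sequence[float], y: Sequence[float],
                        max_lag: int) -> List[float]:
    """Normalized cross-correlation of x and y at lags -max_lag..max_lag."""
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        return [0.0] * (2 * max_lag + 1)
    samples = []
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += x[i] * y[j]
        samples.append(acc / norm)
    return samples


def correlation_measure(samples: Sequence[float],
                        even_fn: Callable[[float], float] = lambda c: c * c
                        ) -> float:
    """Sum of an even function of the samples (default: squares, as in claim 4)."""
    return sum(even_fn(c) for c in samples)
```

Passing `abs` as `even_fn` gives one example of the more general even-function form of claim 5.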

6. The noise reduction system of claim 4, wherein at least one of the subtraction 
factors, $k_1$, $k_2$, and $k_3$, is derived from the correlation measurement of the set of 
correlation samples. 

7. The noise reduction system of claim 5, wherein at least one of the subtraction 
factors, $k_1$, $k_2$, and $k_3$, is derived from the correlation measurement of the set of 
correlation samples. 

8. The noise reduction system of claim 3, wherein at least one of the subtraction 
factors, $k_1$, $k_2$, and $k_3$, is smoothed over time. 

9. The noise reduction system of claim 6, wherein at least one of the subtraction 
factors, $k_1$, $k_2$, and $k_3$, is smoothed over time. 

10. The noise reduction system of claim 7, wherein at least one of the subtraction 
factors, $k_1$, $k_2$, and $k_3$, is smoothed over time. 
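
One simple way to smooth a subtraction factor over time, as in claims 8 to 10, is first-order exponential averaging. The claims do not name a smoothing method, so the recursion, the parameter `alpha`, and the function names below are assumptions for illustration only.

```python
from typing import List, Sequence


def smooth_factor(previous: float, current: float, alpha: float = 0.9) -> float:
    """One step of exponential averaging; alpha near 1 means slower change."""
    return alpha * previous + (1.0 - alpha) * current


def smooth_sequence(raw: Sequence[float], alpha: float = 0.9) -> List[float]:
    """Smooth a per-block sequence of factor values, starting from raw[0]."""
    if not raw:
        return []
    state = raw[0]
    out = []
    for value in raw:
        state = smooth_factor(state, value, alpha)
        out.append(state)
    return out
```

With this choice a sudden change in the raw factor is spread over several blocks instead of being applied at once.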



11. The noise reduction system of claim 2, wherein $k_1$, $k_2$, and $k_3$ are derived as 

$$k_1(i) = (1 - \gamma(i)) \cdot t_1 + r_1$$

$$k_2(i) = \gamma(i) \cdot t_2 + r_2$$

$$k_3(i) = (1 - \gamma(i)) \cdot t_3 + r_3$$

where $t_1$, $t_2$, $t_3$ are scalar multiplication factors, $r_1$, $r_2$, $r_3$ are additive factors, and $\gamma(i)$ is 
an averaged square correlation sum of the first signal and the second signal. 
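
The mapping from the correlation measure to the three factors, read here as k_1(i) = (1 - γ(i))·t_1 + r_1, k_2(i) = γ(i)·t_2 + r_2, and k_3(i) = (1 - γ(i))·t_3 + r_3 (the same form as in claim 41), can be written directly. The function name and the default values for the multiplication and additive factors are placeholders chosen for illustration.

```python
from typing import Sequence, Tuple


def subtraction_factors(gamma: float,
                        t: Sequence[float] = (1.0, 1.0, 1.0),
                        r: Sequence[float] = (0.0, 0.0, 0.0)
                        ) -> Tuple[float, float, float]:
    """Derive (k1, k2, k3) for one block from the correlation measure gamma."""
    k1 = (1.0 - gamma) * t[0] + r[0]
    k2 = gamma * t[1] + r[1]
    k3 = (1.0 - gamma) * t[2] + r[2]
    return k1, k2, k3
```

For example, with a correlation measure of 0.25 and the default factors, the first and third factors come out at 0.75 and the second at 0.25: $k_1$ and $k_3$ fall as the correlation rises, while $k_2$ rises with it.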

12. The noise reduction system of claim 1, wherein the controller substantially 
equalizes energy levels of the first signal and the second signal. 

13. The noise reduction system of claim 1, wherein the controller substantially 
equalizes magnitude levels of the first signal and the second signal. 
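
Level equalization as in claims 12 and 13 can be sketched by scaling one block of samples so that its energy, or its mean magnitude, matches the other. Which signal is scaled, the block-wise operation, and the function names are assumptions made for this illustration.

```python
import math
from typing import List, Sequence


def energy(x: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(v * v for v in x)


def mean_magnitude(x: Sequence[float]) -> float:
    """Average absolute sample value (0.0 for an empty block)."""
    return sum(abs(v) for v in x) / len(x) if x else 0.0


def equalize_energy(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Scale `second` so that its energy matches that of `first`."""
    e2 = energy(second)
    if e2 == 0.0:
        return list(second)
    gain = math.sqrt(energy(first) / e2)
    return [gain * v for v in second]


def equalize_magnitude(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Scale `second` so that its mean magnitude matches that of `first`."""
    m2 = mean_magnitude(second)
    if m2 == 0.0:
        return list(second)
    gain = mean_magnitude(first) / m2
    return [gain * v for v in second]
```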

14. The noise reduction system of claim 1, wherein the controller derives at least 
one of the first, second, and third subtraction factors from a ratio of a noise signal 
measurement of the first signal and a noise signal measurement of the second signal. 

15. The noise reduction system of claim 1, wherein the controller derives at least 
one of the first, second, and third subtraction factors from a ratio of a desired signal 
measurement of the second signal and the desired signal measurement of the first 
signal. 

16. The noise reduction system of claim 14, wherein each of the noise signal 
measurements is an energy measurement. 

17. The noise reduction system of claim 14, wherein each of the noise signal 
measurements is a magnitude measurement. 

18. The noise reduction system of claim 15, wherein each of the desired signal 
measurements is an energy measurement. 

19. The noise reduction system of claim 15, wherein each of the desired signal 
measurements is a magnitude measurement. 

20. The noise reduction system of claim 15, wherein the desired signal is a speech 
signal. 

21. The noise reduction system of claim 14, wherein the controller computes at 
least one of a first relative positive measurement based on a first gain function and a 
second relative positive measurement based on a second gain function. 

22. The noise reduction system of claim 15, wherein the controller computes at 
least one of a first relative positive measurement based on a first gain function and a 
second relative positive measurement based on a second gain function. 

23. The noise reduction system of claim 21, wherein the noise signal measurement 
is derived from at least one of the first signal and the second signal, and at least one of 
the first relative positive measurement and the second relative positive measurement, 
respectively. 

24. The noise reduction system of claim 22, wherein the desired signal 
measurement is derived from at least one of the first signal and the second signal, and 
at least one of the first relative positive measurement and the second relative positive 
measurement, respectively. 

25. The noise reduction system of claim 14, wherein a frequency dependent 
weighting function, performed by at least one of the first and second spectral 
subtraction processors, is used to derive at least one of a first and second frequency 
dependent positive measurement. 

26. The noise reduction system of claim 15, wherein a frequency dependent 
weighting function, performed by at least one of the first and second spectral 
subtraction processors, is used to derive at least one of a first and second frequency 
dependent positive measurement. 

27. The noise reduction system of claim 25, wherein the noise signal measurement 
is derived from at least one of the first signal and the second signal, and at least one of 
the first frequency dependent positive measurement and the second frequency 
dependent positive measurement. 

28. The noise reduction system of claim 26, wherein the noise signal measurement 
is derived from at least one of the first signal and the second signal, and at least one of 
the first frequency dependent positive measurement and the second frequency 
dependent positive measurement. 

29. The noise reduction system of claim 14, wherein $k_1$, $k_2$, and $k_3$ are derived as: 

[Equations for $k_1(i)$, $k_2(i)$, and $k_3(i)$ are illegible in the source.] 

where 

[Definition illegible in the source.] 

where $p_{1,x}(i)$ is an energy level of the first signal and $p_{2,x}(i)$ is an energy level of 
the second signal, $t_1$, $t_2$, $t_3$ are scalar multiplication factors, $G_1$ is a first gain function, 
and $G_2$ is a second gain function. 

30. The noise reduction system of claim 15, wherein $k_1$, $k_2$, and $k_3$ are derived as: 

[Equations for $k_1(i)$, $k_2(i)$, and $k_3(i)$ are illegible in the source.] 

where 

[Definition illegible in the source; the surviving fragment is a sum of a gain function $G(m, i)$ over $m$ starting at $m = 0$.] 

where $p_{1,x}(i)$ is a magnitude of the first signal and $p_{2,x}(i)$ is a magnitude level of 
the second signal, $t_1$, $t_2$, $t_3$ are scalar multiplication factors, $G_1$ is a first gain function, 
and $G_2$ is a second gain function. 

31. A method for processing a noisy input signal and a noise signal to provide a 
noise reduced output signal, comprising the steps of: 

(a) using spectral subtraction to filter said noisy input signal to provide a first 
noise reduced output signal, wherein an amount of subtraction performed is controlled 
by a first subtraction factor, $k_1$; 

(b) using spectral subtraction to filter said noise signal to provide a noise 
estimate output signal, wherein an amount of subtraction performed is controlled by a 
second subtraction factor, $k_2$; and 

(c) using spectral subtraction to filter said noisy input signal as a function of 
said noise estimate output signal, wherein an amount of subtraction is controlled by a 
third subtraction factor, $k_3$, 

wherein at least one of the first, second, and third subtraction factors is 
dynamically determined during the processing of the noisy input signal and the noise 
signal. 

32. The method of claim 31, wherein a correlation between the first signal and the 
second signal is estimated. 

33. The method of claim 32, wherein at least one of the first, second, and third 
subtraction factors, $k_1$, $k_2$, and $k_3$, is based on the correlation between the first signal 
and the second signal. 

34. The method of claim 32, wherein a set of correlation samples of the first signal 
and the second signal is estimated and a correlation measurement as a sum of squares of 
the set of correlation samples is computed. 

35. The method of claim 32, wherein a set of correlation samples of the first signal 
and the second signal is estimated and a correlation measurement as a sum of an even 
function of the set of correlation samples is computed. 

36. The method of claim 34, wherein at least one of the subtraction factors, $k_1$, $k_2$, 
and $k_3$, is derived from the correlation measurement of the set of correlation samples. 

37. The method of claim 35, wherein at least one of the subtraction factors, $k_1$, $k_2$, 
and $k_3$, is derived from the correlation measurement of the set of correlation samples. 

38. The method of claim 33, wherein at least one of the subtraction factors, $k_1$, $k_2$, 
and $k_3$, is smoothed over time. 

39. The method of claim 36, wherein at least one of the subtraction factors, $k_1$, $k_2$, 
and $k_3$, is smoothed over time. 

40. The method of claim 37, wherein at least one of the subtraction factors, $k_1$, $k_2$, 
and $k_3$, is smoothed over time. 

41. The method of claim 32, wherein $k_1$, $k_2$, and $k_3$ are derived as 

$$k_1(i) = (1 - \gamma(i)) \cdot t_1 + r_1$$

$$k_2(i) = \gamma(i) \cdot t_2 + r_2$$

$$k_3(i) = (1 - \gamma(i)) \cdot t_3 + r_3$$

where $t_1$, $t_2$, $t_3$ are scalar multiplication factors, $r_1$, $r_2$, $r_3$ are additive factors, and $\gamma(i)$ is 
an averaged squared correlation sum of the first signal and the second signal. 

42. The method of claim 31, wherein energy levels of the first signal and the 
second signal are substantially equalized. 

43. The method of claim 31, wherein magnitude levels of the first signal and the 
second signal are substantially equalized. 

44. The method of claim 31, wherein at least one of the first, second, and third 
subtraction factors is derived from a ratio of a noise signal measurement of the first 
signal and a noise signal measurement of the second signal. 

45. The method of claim 31, wherein at least one of the first, second, and third 
subtraction factors is derived from a ratio of a desired signal measurement of the second 
signal and the desired signal measurement of the first signal. 

46. The method of claim 44, wherein each of the noise signal measurements is an 
energy measurement. 

47. The method of claim 44, wherein each of the noise signal measurements is a 
magnitude measurement. 

48. The method of claim 45, wherein each of the desired signal measurements is an 
energy measurement. 

49. The method of claim 45, wherein each of the desired signal measurements is a 
magnitude measurement. 

50. The method of claim 45, wherein the desired signal is a speech signal. 

51. The method of claim 45, wherein at least one of a first relative positive 
measurement based on a first gain function and a second relative positive measurement 
based on a second gain function is computed. 

52. The method of claim 46, wherein at least one of a first relative positive 
measurement based on a first gain function and a second relative positive measurement 
based on a second gain function is computed. 

53. The method of claim 51, wherein the noise signal measurement is derived from 
at least one of the first signal and the second signal, and at least one of the first relative 
positive measurement and the second relative positive measurement, respectively. 

54. The method of claim 52, wherein the desired signal measurement is derived 
from at least one of the first signal and the second signal, and at least one of the first 
relative positive measurement and the second relative positive measurement, 
respectively. 

55. The method of claim 44, wherein a frequency dependent weighting function is 
used to derive at least one of a first and second frequency dependent positive 
measurement. 

56. The method of claim 45, wherein a frequency dependent weighting function is 
used to derive at least one of a first and second frequency dependent positive 
measurement. 

57. The method of claim 55, wherein the noise signal measurement is derived from 
at least one of the first signal and the second signal, and at least one of the first 
frequency dependent positive measurement and the second frequency dependent 
positive measurement. 

58. The method of claim 56, wherein the noise signal measurement is derived from 
at least one of the first signal and the second signal, and at least one of the first 
frequency dependent positive measurement and the second frequency dependent 
positive measurement. 

59. The method of claim 44, wherein $k_1$, $k_2$, and $k_3$ are derived as: 

[Equations for $k_1(i)$, $k_2(i)$, and $k_3(i)$ are largely illegible in the source; the surviving fragments appear to have the form $p_{1,x}(i)\,(1 - g_{1,M}(i-1))$ and $p_{2,x}(i)\,(1 - g_{2,M}(i-1))$.] 

where 

[Definition illegible in the source.] 

where $p_{1,x}(i)$ is an energy level of the first signal and $p_{2,x}(i)$ is an energy level of 
the second signal, $t_1$, $t_2$, $t_3$ are scalar multiplication factors, $G_1$ is a first gain function, 
and $G_2$ is a second gain function. 

60. The method of claim 45, wherein $k_1$, $k_2$, and $k_3$ are derived as: 

[Equations for $k_1(i)$, $k_2(i)$, and $k_3(i)$ are largely illegible in the source; the surviving fragments appear to show products of the form $p_{1,x}(i)\,g_{1,M}(i)$, with the results scaled by $t_2$ and $t_3$.] 

where 

[Definition largely illegible in the source; the surviving fragment is a sum of a gain function $G(m, i)$ over $m$ with upper limit $M-1$.] 

where $p_{1,x}(i)$ is a magnitude of the first signal and $p_{2,x}(i)$ is a magnitude level of 
the second signal, $t_1$, $t_2$, $t_3$ are scalar multiplication factors, $G_1$ is a first gain function, 
and $G_2$ is a second gain function. 